
use a more optimal loop implementation#131

Merged
febo merged 1 commit into solana-program:main from nagisa:perffix
Mar 3, 2026

Conversation


@nagisa nagisa commented Feb 26, 2026

In platform-tools v1.53 inlining decisions of Iterator::next prevent LLVM from optimizing this loop into an unrolled index-based version of it.

So where in 1.51 we'd see

```asm
INLINE_MEMMOVE($from, $to, 32 bytes)
jeq $length, 1, exit
INLINE_MEMMOVE($from, $to, 32 bytes)
jeq $length, 2, exit
...
```

In 1.53 we instead get a much more CU-heavy pointer-comparison-based loop:

```asm
lsh64 $index, 3
ldxdw r4, [r1 + 0]
add64 r1, $index
INLINE_MEMMOVE($from, $to, 32)
ldxdw r6, [r10 - 48]
jeq r1, r2, exit
; ...
```

Since we want an index-based loop here for optimal results, might as well write it that way directly instead of relying on optimizations to trigger.
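A minimal sketch of the two loop shapes, using plain `[u8; 32]` keys in slices as stand-ins for the program's account structures (the real field and method names differ):

```rust
// Iterator-based form: per the PR description, on platform-tools
// v1.53 this lowers to a pointer-comparison loop.
fn copy_iter(src: &[[u8; 32]], dst: &mut [[u8; 32]]) {
    for (i, key) in src.iter().enumerate() {
        dst[i] = *key;
    }
}

// Index-based form the PR switches to. Both forms are semantically
// identical; only the generated sBPF differs.
fn copy_indexed(src: &[[u8; 32]], dst: &mut [[u8; 32]]) {
    for i in 0..src.len() {
        dst[i] = src[i];
    }
}

fn main() {
    let src = [[1u8; 32], [2u8; 32]];
    let (mut a, mut b) = ([[0u8; 32]; 2], [[0u8; 32]; 2]);
    copy_iter(&src, &mut a);
    copy_indexed(&src, &mut b);
    // Same result either way; the difference is purely codegen/CUs.
    assert_eq!(a, b);
}
```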


nagisa commented Feb 26, 2026

cc @LucasSte

Please test @febo

@LucasSte LucasSte requested a review from febo February 26, 2026 18:37
@LucasSte

On v1.51 I see 192 CUs (down from 193 compared to master) and on v1.54 I see 186. Interesting that there appears to be a regression in Rust, no?


```diff
-for (i, signer_info) in remaining.iter().enumerate() {
-    multisig.signers[i] = *signer_info.key();
+for i in 0..remaining.len() {
```

We will need to add #[allow(clippy::needless_range_loop)] so CI is happy.
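For reference, the `allow` attribute can be scoped to the loop statement itself rather than the enclosing function, so the lint stays active elsewhere. A sketch with hypothetical stand-in types (the real `Multisig` and account types differ):

```rust
// Hypothetical stand-in for the program's multisig account state.
struct Multisig {
    signers: [[u8; 32]; 8],
}

fn set_signers(multisig: &mut Multisig, remaining: &[[u8; 32]]) {
    // Silence the lint only where the index-based loop is intentional
    // (it is the deliberate choice here, for codegen reasons).
    #[allow(clippy::needless_range_loop)]
    for i in 0..remaining.len() {
        multisig.signers[i] = remaining[i];
    }
}

fn main() {
    let mut m = Multisig { signers: [[0u8; 32]; 8] };
    set_signers(&mut m, &[[3u8; 32]]);
    assert_eq!(m.signers[0], [3u8; 32]);
}
```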


febo commented Mar 2, 2026

@nagisa Looks good – thanks! We just need to make CI happy. 😊

Co-authored-by: Lucas Ste <38472950+LucasSte@users.noreply.github.com>
@febo febo left a comment

Looks great – thanks!


nagisa commented Mar 3, 2026

You will have to merge it yourself, I don't have the perms.

@febo febo merged commit e7fe2e3 into solana-program:main Mar 3, 2026
27 checks passed

3 participants